
    Simulating realistic multiparty speech data: for the development of distant microphone ASR systems

    Automatic speech recognition has become a ubiquitous technology integrated into our daily lives. However, the problem remains challenging when the speaker is far away from the microphone. In such scenarios, the speech is degraded both by reverberation and by the presence of additive noise. The situation is particularly challenging when competing speakers are present (i.e. multi-party scenarios).

    Acoustic scene simulation has been a major tool for training and developing distant microphone speech recognition systems, and is now being used to develop solutions for multi-party scenarios. It has been used both in training -- as it allows cheap generation of limitless amounts of data -- and for evaluation -- because it can provide easy access to a ground truth (i.e. a noise-free target signal). However, whilst much work has been conducted to produce realistic artificial scene simulators, the signals produced by such simulators are only as good as the 'metadata' used to define the setups, i.e., the data describing, for example, the number of speakers and their distribution relative to the microphones. This thesis looks at how realistic metadata can be derived by analysing how speakers behave in real domestic environments. In particular, it examines how to produce scenes with realistic distributions for the factors known to influence the 'difficulty' of a scene: the separation angle between speakers, the absolute and relative distances of speakers to microphones, and the pattern of temporal overlap of speech. Using an existing audio-visual multi-party conversational dataset, CHiME-5, each of these aspects has been studied in turn.

    First, producing a realistic angular separation between speakers allows algorithms that enhance signals based on the direction of arrival to be fairly evaluated, reducing the mismatch between real and simulated data. The separation angle was estimated by applying automatic people detection techniques to the video recordings of CHiME-5. Results show that commonly used datasets of simulated signals do not follow a realistic distribution, and that when a realistic distribution is enforced, a significant drop in performance is observed.

    Second, by using multiple cameras it has been possible to estimate the 2-D positions of people inside each scene. This has allowed realistic distributions to be estimated for both the absolute distance to the microphone and the relative distance to the competing speaker. The results reveal grouping behaviour among participants within a room, and show that its impact on performance depends on the room size considered.

    Finally, the amount of overlap and the points in a mixture at which overlap occurs were explored using finite-state models. These models allowed mixtures to be generated whose overlap patterns approach those observed in the real data. Features derived from these models were also shown to be predictors of the difficulty of a mixture.

    At each stage of the project, simulated datasets derived using the realistic metadata distributions have been compared to existing standard datasets that use naive or uninformed metadata distributions, and the implications for speech recognition performance are observed and discussed. This work has demonstrated how unrealistic approaches can produce over-promising results and can bias research towards techniques that might not work well in practice. The results will also be valuable in informing the design of future simulated datasets.
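    The separation angle in the first study is the angle subtended at the microphone by the two speaker positions. A minimal sketch of that computation, assuming 2-D floor-plane coordinates in metres; the function name and example positions are illustrative, not taken from the thesis:

```python
import numpy as np

def angular_separation(mic_pos, spk_a, spk_b):
    """Angle (degrees) between two speakers as seen from a microphone.

    All arguments are 2-D floor-plane positions, e.g. in metres.
    """
    va = np.asarray(spk_a, dtype=float) - np.asarray(mic_pos, dtype=float)
    vb = np.asarray(spk_b, dtype=float) - np.asarray(mic_pos, dtype=float)
    cos_theta = np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb))
    # Clip to guard against floating-point drift just outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Example: two speakers 0.5 m apart, both about 2 m from the microphone,
# subtend an angle of roughly 14 degrees.
print(angular_separation((0.0, 0.0), (2.0, 0.25), (2.0, -0.25)))
```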
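    The absolute and relative distances in the second study follow directly from the same 2-D positions, and the contrast between naive and informed metadata can be sketched as sampling speaker positions either uniformly over the room floor or from empirically observed positions. This is a sketch under assumptions, not the thesis pipeline: observed_positions stands for an (N, 2) array of speaker locations estimated from the CHiME-5 video, and the jitter value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def distances(mic_pos, spk_a, spk_b):
    """Absolute speaker-to-microphone distances and the speaker-to-speaker distance."""
    mic = np.asarray(mic_pos, dtype=float)
    a = np.asarray(spk_a, dtype=float)
    b = np.asarray(spk_b, dtype=float)
    return (np.linalg.norm(a - mic),   # absolute distance, speaker A
            np.linalg.norm(b - mic),   # absolute distance, speaker B
            np.linalg.norm(a - b))     # relative distance between speakers

def sample_uniform(room_wh):
    """Naive metadata: a position drawn uniformly over the room floor."""
    return rng.uniform(low=(0.0, 0.0), high=room_wh)

def sample_empirical(observed_positions, jitter=0.1):
    """Informed metadata: resample an observed position with small jitter,
    preserving effects such as participants grouping together in the room."""
    idx = rng.integers(len(observed_positions))
    return observed_positions[idx] + rng.normal(0.0, jitter, size=2)
```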
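    The abstract does not specify the topology of the finite-state overlap models. One minimal reading of the idea is a first-order Markov chain over joint speaker-activity states (silence, each speaker alone, overlap), with transition probabilities estimated from frame-level activity labels in the real recordings; the probabilities below are placeholders, not figures from the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# States for a two-speaker mixture, per fixed-length frame:
# 0 = silence, 1 = speaker A only, 2 = speaker B only, 3 = overlap.
STATES = ["silence", "A", "B", "overlap"]

# Transition probabilities would be estimated from frame-level voice
# activity labels in real conversations; these rows are placeholders.
P = np.array([
    [0.90, 0.05, 0.05, 0.00],
    [0.04, 0.90, 0.01, 0.05],
    [0.04, 0.01, 0.90, 0.05],
    [0.02, 0.09, 0.09, 0.80],
])

def generate(n_frames, start=0):
    """Sample a frame-level activity pattern from the Markov chain."""
    seq = [start]
    for _ in range(n_frames - 1):
        seq.append(rng.choice(4, p=P[seq[-1]]))
    return seq

seq = generate(1000)
# The fraction of frames in the overlap state is one example of a
# model-derived feature that could relate to mixture difficulty.
print(f"overlap ratio: {seq.count(3) / len(seq):.2f}")
```

    Sampling from such a chain yields activity patterns whose overlap statistics can be matched to real conversations, which is one way the generated mixtures could approach the overlap patterns observed in the real data.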